已发现分段模型容易受到针对性和非靶向的对抗攻击。然而,所产生的分割输出通常会损坏,即易于发现攻击。在本文中,我们提出了语义隐秘的对抗性攻击,可以在同时保留非针对性标签的同时操纵有针对性的标签。一个挑战是在数据集和模型中进行语义上有意义的操纵。另一个挑战是避免损害非目标标签。为了解决这些挑战,我们将每个输入映像视为生成扰动的先验知识。我们还设计了一个特殊的常规器,以帮助提取功能。为了评估我们的模型的性能,我们设计了三种基本攻击类型,即“消失在上下文”,“嵌入虚假标签”和“位移目标对象”。我们的实验表明,我们的隐身对抗模型可以攻击在城市景观,马地图和BDD100K上具有相对高的成功率的分段模型。我们的框架在数据集和模型上显示了良好的经验概括。
translated by 谷歌翻译
Springs are efficient in storing and returning elastic potential energy but are unable to hold the energy they store in the absence of an external load. Lockable springs use clutches to hold elastic potential energy in the absence of an external load but have not yet been widely adopted in applications, partly because clutches introduce design complexity, reduce energy efficiency, and typically do not afford high-fidelity control over the energy stored by the spring. Here, we present the design of a novel lockable compression spring that uses a small capstan clutch to passively lock a mechanical spring. The capstan clutch can lock up to 1000 N force at any arbitrary deflection, unlock the spring in less than 10 ms with a control force less than 1 % of the maximal spring force, and provide an 80 % energy storage and return efficiency (comparable to a highly efficient electric motor operated at constant nominal speed). By retaining the form factor of a regular spring while providing high-fidelity locking capability even under large spring forces, the proposed design could facilitate the development of energy-efficient spring-based actuators and robots.
translated by 谷歌翻译
Springs can provide force at zero net energy cost by recycling negative mechanical work to benefit motor-driven robots or spring-augmented humans. However, humans have limited force and range of motion, and motors have a limited ability to produce force. These limits constrain how much energy a conventional spring can store and, consequently, how much assistance a spring can provide. In this paper, we introduce an approach to accumulating negative work in assistive springs over several motion cycles. We show that, by utilizing a novel floating spring mechanism, the weight of a human or robot can be used to iteratively increase spring compression, irrespective of the potential energy stored by the spring. Decoupling the force required to compress a spring from the energy stored by a spring advances prior works, and could enable spring-driven robots and humans to perform physically demanding tasks without the use of large actuators.
translated by 谷歌翻译
Wearable sensors for measuring head kinematics can be noisy due to imperfect interfaces with the body. Mouthguards are used to measure head kinematics during impacts in traumatic brain injury (TBI) studies, but deviations from reference kinematics can still occur due to potential looseness. In this study, deep learning is used to compensate for the imperfect interface and improve measurement accuracy. A set of one-dimensional convolutional neural network (1D-CNN) models was developed to denoise mouthguard kinematics measurements along three spatial axes of linear acceleration and angular velocity. The denoised kinematics had significantly reduced errors compared to reference kinematics, and reduced errors in brain injury criteria and tissue strain and strain rate calculated via finite element modeling. The 1D-CNN models were also tested on an on-field dataset of college football impacts and a post-mortem human subject dataset, with similar denoising effects observed. The models can be used to improve detection of head impacts and TBI risk evaluation, and potentially extended to other sensors measuring kinematics.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and of the assessment of the performances of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant meta-data. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
translated by 谷歌翻译
In this paper, we present an adjustable-equilibrium parallel elastic actuator (AE-PEA). The actuator consists of a motor, an equilibrium adjusting mechanism, and a spring arranged into a cylindrical geometry, similar to a motor-gearbox assembly. The novel component of the actuator is the equilibrium adjusting mechanism which (i) does not require external energy to maintain the equilibrium position of the actuator even if the spring is deformed and (ii) enables equilibrium position control with low energy cost by rotating the spring while keeping it undeformed. Adjustable equilibrium parallel elastic actuators resolve the main limitation of parallel elastic actuators (PEAs) by enabling energy-efficient operation at different equilibrium positions, instead of being limited to energy-efficient operation at a single equilibrium position. We foresee the use of AE-PEAs in industrial robots, mobile robots, exoskeletons, and prostheses, where efficient oscillatory motion and gravity compensation at different positions are required.
translated by 谷歌翻译
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built, by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
translated by 谷歌翻译
Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand classifiers decision-making. A data set including 456 macular scans were graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combing two U-shaped generators to provide meaningful interpretability to this classifier. The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.
translated by 谷歌翻译
The last several years have witnessed remarkable progress in video-and-language (VidL) understanding. However, most modern VidL approaches use complex and specialized model architectures and sophisticated pretraining protocols, making the reproducibility, analysis and comparisons of these frameworks difficult. Hence, instead of proposing yet another new VidL model, this paper conducts a thorough empirical study demystifying the most important factors in the VidL model design. Among the factors that we investigate are (i) the spatiotemporal architecture design, (ii) the multimodal fusion schemes, (iii) the pretraining objectives, (iv) the choice of pretraining data, (v) pretraining and finetuning protocols, and (vi) dataset and model scaling. Our empirical study reveals that the most important design factors include: temporal modeling, video-to-text multimodal fusion, masked modeling objectives, and joint training on images and videos. Using these empirical insights, we then develop a step-by-step recipe, dubbed VindLU, for effective VidL pretraining. Our final model trained using our recipe achieves comparable or better than state-of-the-art results on several VidL tasks without relying on external CLIP pretraining. In particular, on the text-to-video retrieval task, our approach obtains 61.2% on DiDeMo, and 55.0% on ActivityNet, outperforming current SOTA by 7.8% and 6.1% respectively. Furthermore, our model also obtains state-of-the-art video question-answering results on ActivityNet-QA, MSRVTT-QA, MSRVTT-MC and TVQA. Our code and pretrained models are publicly available at: https://github.com/klauscc/VindLU.
translated by 谷歌翻译